(57) Abstract: A method and system for removing acoustic noise (Fig. 5) from
human speech is described. Acoustic noise is removed regardless of noise type,
amplitude, or orientation. The system includes a processor (30) coupled among
microphones (1,2) and a voice activation detection ("VAD") element (104). The
processor executes denoising algorithms that generate transfer functions. The
processor (30) receives acoustic data from the microphones (1,2) and data from the
VAD (104), and generates the transfer functions both when the VAD indicates
voicing activity and when the VAD indicates no voicing activity. The transfer
functions are used to generate a denoised data stream.

METHOD AND APPARATUS FOR REMOVING NOISE FROM
ELECTRONIC SIGNALS

RELATED APPLICATIONS 

This patent application is a continuation in part of U.S. Patent Application
Serial No. 09/905,361, filed July 12, 2001, which is hereby incorporated by reference.
This patent application also claims priority from U.S. Provisional Patent Application
Serial No. 60/332,202, filed November 21, 2001.

FIELD OF THE INVENTION 

The invention is in the field of mathematical methods and electronic systems for
removing or suppressing undesired acoustical noise from acoustic transmissions or
recordings.

BACKGROUND 

In a typical acoustic application, speech from a human user is recorded or stored
and transmitted to a receiver in a different location. In the environment of the user,
there may exist one or more noise sources that pollute the signal of interest (the user's
speech) with unwanted acoustic noise. This makes it difficult or impossible for the
receiver, whether human or machine, to understand the user's speech. This is
especially problematic now with the proliferation of portable communication devices
like cellular telephones and personal digital assistants. There are existing methods for
suppressing these noise additions, but they have significant disadvantages. For
example, existing methods are slow because of the computing time required. Existing
methods may also require cumbersome hardware, unacceptably distort the signal of
interest, or have such poor performance that they are not useful. Many of these existing
methods are described in textbooks such as "Advanced Digital Signal Processing and
Noise Reduction" by Vaseghi, ISBN 0-471-62692-9.

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1 is a block diagram of a denoising system, under an embodiment.

Figure 2 is a block diagram illustrating a noise removal algorithm, under an
embodiment assuming a single noise source and a direct path to the microphones.

Figure 3 is a block diagram illustrating a front end of a noise removal algorithm
of an embodiment generalized to n distinct noise sources (these noise sources may be
reflections or echoes of one another).

Figure 4 is a block diagram illustrating a front end of a noise removal algorithm
of an embodiment in a general case where there are n distinct noise sources and signal
reflections.

Figure 5 is a flow diagram of a denoising method, under an embodiment.

Figure 6 shows results of a noise suppression algorithm of an embodiment for
an American English female speaker in the presence of airport terminal noise that
includes many other human speakers and public announcements.

Figure 7 is a block diagram of a physical configuration for denoising using
unidirectional and omnidirectional microphones, under the embodiments of Figures 2,
3, and 4.

Figure 8 is a denoising microphone configuration including two
omnidirectional microphones, under an embodiment.

Figure 9 is a plot of the C required versus distance, under the embodiment of
Figure 8.

Figure 10 is a block diagram of a front end of a noise removal algorithm under
an embodiment in which the two microphones have different response characteristics.

Figure 11A is a plot of the difference in frequency response (percent) between
the microphones (at a distance of 4 centimeters) before compensation.

Figure 11B is a plot of the difference in frequency response (percent) between
the microphones (at a distance of 4 centimeters) after DFT compensation, under an
embodiment.

Figure 11C is a plot of the difference in frequency response (percent) between
the microphones (at a distance of 4 centimeters) after time-domain filter compensation,
under an alternate embodiment.

DETAILED DESCRIPTION 

The following description provides specific details for a thorough understanding
of, and enabling description for, embodiments of the invention. However, one skilled
in the art will understand that the invention may be practiced without these details. In
other instances, well-known structures and functions have not been shown or described
in detail to avoid unnecessarily obscuring the description of the embodiments of the
invention.

Unless described otherwise below, the construction and operation of the various
blocks shown in the figures are of conventional design. As a result, such blocks need
not be described in further detail herein, because they will be understood by those
skilled in the relevant art. Such further detail is omitted for brevity and so as not to
obscure the detailed description of the invention. Any modifications necessary to the
blocks in the Figures (or other embodiments) can be readily made by one skilled in the
relevant art based on the detailed description provided herein.

Figure 1 is a block diagram of a denoising system of an embodiment that uses
knowledge of when speech is occurring derived from physiological information on
voicing activity. The system includes microphones 10 and sensors 20 that provide
signals to at least one processor 30. The processor includes a denoising subsystem or
algorithm 40.

Figure 2 is a block diagram illustrating a noise removal algorithm of an
embodiment, showing system components used. A single noise source and a direct
path to the microphones are assumed. Figure 2 includes a graphic description of the
process of an embodiment, with a single signal source 100 and a single noise source
101. This algorithm uses two microphones: a "signal" microphone 1 ("MIC 1") and a
"noise" microphone 2 ("MIC 2"), but is not so limited. MIC 1 is assumed to capture
mostly signal with some noise, while MIC 2 captures mostly noise with some signal.
The data from the signal source 100 to MIC 1 is denoted by s(n), where s(n) is a
discrete sample of the analog signal from the source 100. The data from the signal
source 100 to MIC 2 is denoted by s2(n). The data from the noise source 101 to MIC 2
is denoted by n(n). The data from the noise source 101 to MIC 1 is denoted by n2(n).
Similarly, the data from MIC 1 to noise removal element 105 is denoted by m1(n), and
the data from MIC 2 to noise removal element 105 is denoted by m2(n).

The noise removal element also receives a signal from a voice activity detection
("VAD") element 104. The VAD 104 uses physiological information to
determine when a speaker is speaking. In various embodiments, the VAD includes a
radio frequency device, an electroglottograph, an ultrasound device, an acoustic throat
microphone, and/or an airflow detector.

The transfer functions from the signal source 100 to MIC 1 and from the noise
source 101 to MIC 2 are assumed to be unity. The transfer function from the signal
source 100 to MIC 2 is denoted by H2(z), and the transfer function from the noise
source 101 to MIC 1 is denoted by H1(z). The assumption of unity transfer functions
does not inhibit the generality of this algorithm, as the actual relations between the
signal, noise, and microphones are simply ratios and the ratios are redefined in this
manner for simplicity.

In conventional noise removal systems, the information from MIC 2 is used to
attempt to remove noise from MIC 1. However, an unspoken assumption is that the
VAD element 104 is never perfect, and thus the denoising must be performed
cautiously, so as not to remove too much of the signal along with the noise. If,
however, the VAD 104 is assumed to be perfect such that it is equal to zero when there
is no speech being produced by the user, and equal to one when speech is produced, a
substantial improvement in the noise removal can be made.

In analyzing the single noise source 101 and the direct path to the microphones,
with reference to Figure 2, the total acoustic information coming into MIC 1 is denoted
by m1(n). The total acoustic information coming into MIC 2 is similarly labeled m2(n).
In the z (digital frequency) domain, these are represented as M1(z) and M2(z). Then

$$M_1(z) = S(z) + N_2(z)$$
$$M_2(z) = N(z) + S_2(z)$$

with

$$N_2(z) = N(z) H_1(z)$$
$$S_2(z) = S(z) H_2(z)$$

so that

$$M_1(z) = S(z) + N(z) H_1(z)$$
$$M_2(z) = N(z) + S(z) H_2(z) \qquad \text{(Eq. 1)}$$

This is the general case for all two-microphone systems. In a practical system
there is always going to be some leakage of noise into MIC 1, and some leakage of
signal into MIC 2. Equation 1 has four unknowns and only two known relationships
and therefore cannot be solved explicitly.

However, there is another way to solve for some of the unknowns in Equation 1.
The analysis starts with an examination of the case where the signal is not being
generated, that is, where a signal from the VAD element 104 equals zero and speech is
not being produced. In this case, s(n) = S(z) = 0, and Equation 1 reduces to

$$M_{1n}(z) = N(z) H_1(z)$$
$$M_{2n}(z) = N(z)$$

where the n subscript on the M variables indicates that only noise is being received.
This leads to

$$M_{1n}(z) = M_{2n}(z) H_1(z)$$
$$H_1(z) = \frac{M_{1n}(z)}{M_{2n}(z)}$$

H1(z) can be calculated using any of the available system identification
algorithms and the microphone outputs when the system is certain that only noise is
being received. The calculation can be done adaptively, so that the system can react to
changes in the noise.

A solution is now available for one of the unknowns in Equation 1. Another
unknown, H2(z), can be determined by using the instances where the VAD equals one
and speech is being produced. When this is occurring, but the recent (perhaps less than
1 second) history of the microphones indicates low levels of noise, it can be assumed
that n(n) = N(z) ≈ 0. Then Equation 1 reduces to

$$M_{1s}(z) = S(z)$$
$$M_{2s}(z) = S(z) H_2(z)$$

which in turn leads to

$$M_{2s}(z) = M_{1s}(z) H_2(z)$$
$$H_2(z) = \frac{M_{2s}(z)}{M_{1s}(z)}$$

which is the inverse of the H1(z) calculation. However, it is noted that different inputs
are being used: now only the signal is occurring whereas before only the noise was
occurring. While calculating H2(z), the values calculated for H1(z) are held constant
and vice versa. Thus, it is assumed that while one of H1(z) and H2(z) is being
calculated, the one not being calculated does not change substantially.

After calculating H1(z) and H2(z), they are used to remove the noise from the
signal. If Equation 1 is rewritten as

$$S(z) = M_1(z) - N(z) H_1(z)$$
$$N(z) = M_2(z) - S(z) H_2(z)$$
$$S(z) = M_1(z) - [M_2(z) - S(z) H_2(z)] H_1(z)$$
$$S(z) [1 - H_1(z) H_2(z)] = M_1(z) - M_2(z) H_1(z)$$

then N(z) may be substituted as shown to solve for S(z) as:

$$S(z) = \frac{M_1(z) - M_2(z) H_1(z)}{1 - H_1(z) H_2(z)} \qquad \text{(Eq. 3)}$$

If the transfer functions H1(z) and H2(z) can be described with sufficient
accuracy, then the noise can be completely removed and the original signal recovered.
This remains true without respect to the amplitude or spectral characteristics of the
noise. The only assumptions made are a perfect VAD, sufficiently accurate H1(z) and
H2(z), and that when one of H1(z) and H2(z) is being calculated the other does not
change substantially. In practice these assumptions have proven reasonable.
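
As a numerical check on Equation 3, the following minimal Python sketch builds the two
microphone spectra of Equation 1 at a single frequency bin and then recovers S(z). The
function names and the numeric values chosen for S, N, H1(z), and H2(z) are illustrative
assumptions and are not taken from the described embodiment.

```python
import cmath


def mix(s, n, h1, h2):
    """Equation 1: spectra observed at MIC 1 and MIC 2 for one frequency bin."""
    m1 = s + n * h1
    m2 = n + s * h2
    return m1, m2


def recover_signal(m1, m2, h1, h2):
    """Equation 3: S = (M1 - M2*H1) / (1 - H1*H2)."""
    denom = 1 - h1 * h2
    if abs(denom) < 1e-12:
        raise ZeroDivisionError("1 - H1*H2 is too close to zero to invert")
    return (m1 - m2 * h1) / denom


# Hypothetical single-bin values: the noise is much louder than the signal.
s_true = 0.8 + 0.3j
noise = 5.0 - 2.0j
h1 = 0.4 * cmath.exp(-0.7j)   # noise path to MIC 1 (assumed value)
h2 = 0.2 * cmath.exp(-0.3j)   # signal path to MIC 2 (assumed value)

m1, m2 = mix(s_true, noise, h1, h2)
s_hat = recover_signal(m1, m2, h1, h2)
print(abs(s_hat - s_true))    # expected to be at rounding-error level
```

With exact transfer functions the recovery does not depend on how loud the noise is,
which is the property claimed in the paragraph above. The guard on the denominator is
an added precaution for the case where H1(z)H2(z) approaches one.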
The noise removal algorithm described herein is easily generalized to include
any number of noise sources. Figure 3 is a block diagram of a front end of a noise
removal algorithm of an embodiment, generalized to n distinct noise sources. These
distinct noise sources may be reflections or echoes of one another, but are not so
limited. There are several noise sources shown, each with a transfer function, or path,
to each microphone. The previously named path H2 has been relabeled as H0, so that
labeling noise source 2's path to MIC 1 is more convenient. The outputs of each
microphone, when transformed to the z domain, are:

$$M_1(z) = S(z) + N_1(z) H_1(z) + N_2(z) H_2(z) + \cdots + N_n(z) H_n(z)$$
$$M_2(z) = S(z) H_0(z) + N_1(z) G_1(z) + N_2(z) G_2(z) + \cdots + N_n(z) G_n(z) \qquad \text{(Eq. 4)}$$

When there is no signal (VAD = 0), then (suppressing the z's for clarity)

$$M_{1n} = N_1 H_1 + N_2 H_2 + \cdots + N_n H_n$$
$$M_{2n} = N_1 G_1 + N_2 G_2 + \cdots + N_n G_n \qquad \text{(Eq. 5)}$$
A new transfer function can now be defined, analogous to H1(z) above:

$$\tilde{H}_1 = \frac{M_{1n}}{M_{2n}} = \frac{N_1 H_1 + N_2 H_2 + \cdots + N_n H_n}{N_1 G_1 + N_2 G_2 + \cdots + N_n G_n} \qquad \text{(Eq. 6)}$$

Thus H̃1 depends only on the noise sources and their respective transfer functions and
can be calculated any time there is no signal being transmitted. Once again, the n
subscripts on the microphone inputs denote only that noise is being detected, while an s
subscript denotes that only signal is being received by the microphones.

Examining Equation 4 while assuming that there is no noise produces

$$M_{1s} = S$$
$$M_{2s} = S H_0$$

Thus H0 can be solved for as before, using any available transfer function calculating
algorithm. Mathematically

$$H_0 = \frac{M_{2s}}{M_{1s}}$$

Rewriting Equation 4, using H̃1 defined in Equation 6, provides

$$\tilde{H}_1 = \frac{M_1 - S}{M_2 - S H_0} \qquad \text{(Eq. 7)}$$

Solving for S yields

$$S = \frac{M_1 - M_2 \tilde{H}_1}{1 - H_0 \tilde{H}_1} \qquad \text{(Eq. 8)}$$

which is the same as Equation 3, with H0 taking the place of H2, and H̃1 taking the
place of H1. Thus the noise removal algorithm still is mathematically valid for any
number of noise sources, including multiple echoes of noise sources. Again, if H0 and
H̃1 can be estimated to a high enough accuracy, and the above assumption of only one
path from the signal to the microphones holds, the noise may be removed completely.
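
The multiple-source result can be checked numerically in the same way. The sketch
below, in Python, forms the combined noise transfer function of Equation 6 from three
noise sources at one frequency bin and applies Equation 8. All numeric values and
function names are hypothetical; the formulas in the docstrings follow from Equation 4
under the stated single-signal-path assumption.

```python
def h1_tilde(noises, h_paths, g_paths):
    """Equation 6: ratio of the noise-only microphone spectra, M1n / M2n."""
    m1n = sum(n * h for n, h in zip(noises, h_paths))
    m2n = sum(n * g for n, g in zip(noises, g_paths))
    return m1n / m2n


def recover_signal(m1, m2, h0, h1t):
    """Equation 8: S = (M1 - M2*H1~) / (1 - H0*H1~)."""
    return (m1 - m2 * h1t) / (1 - h0 * h1t)


# Hypothetical values for three noise sources at one frequency bin.
noises = [3 + 1j, -2 + 0.5j, 1 - 4j]
h_paths = [0.3 - 0.1j, 0.2 + 0.2j, -0.1 + 0.4j]   # noise paths to MIC 1
g_paths = [1 + 0j, 0.8 - 0.3j, 0.9 + 0.2j]        # noise paths to MIC 2
s_true = 1.0 + 0.5j
h0 = 0.1 - 0.2j                                   # signal path to MIC 2

# Equation 4: microphone spectra with signal and all noise sources present.
m1 = s_true + sum(n * h for n, h in zip(noises, h_paths))
m2 = s_true * h0 + sum(n * g for n, g in zip(noises, g_paths))

s_hat = recover_signal(m1, m2, h0, h1_tilde(noises, h_paths, g_paths))
print(abs(s_hat - s_true))   # expected to be at rounding-error level
```

Note that the combined transfer function depends on the noise spectra themselves, so
in this sketch it is only valid while the relative contributions of the noise sources
stay the same between the noise-only measurement and its use.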
The most general case involves multiple noise sources and multiple signal
sources. Figure 4 is a block diagram of a front end of a noise removal algorithm of an
embodiment in the most general case where there are n distinct noise sources and signal
reflections. Here, reflections of the signal enter both microphones. This is the most
general case, as reflections of the noise source into the microphones can be modeled
accurately as simple additional noise sources. For clarity, the direct path from the
signal to MIC 2 has changed from H0(z) to H00(z), and the reflected paths to MIC 1 and
MIC 2 are denoted by H01(z) and H02(z), respectively.

The input into the microphones now becomes

$$M_1(z) = S(z) + S(z) H_{01}(z) + N_1(z) H_1(z) + N_2(z) H_2(z) + \cdots + N_n(z) H_n(z)$$
$$M_2(z) = S(z) H_{00}(z) + S(z) H_{02}(z) + N_1(z) G_1(z) + N_2(z) G_2(z) + \cdots + N_n(z) G_n(z) \qquad \text{(Eq. 9)}$$

When the VAD = 0, the inputs become (suppressing the "z" again)

$$M_{1n} = N_1 H_1 + N_2 H_2 + \cdots + N_n H_n$$
$$M_{2n} = N_1 G_1 + N_2 G_2 + \cdots + N_n G_n$$

which is the same as Equation 5. Thus, the calculation of H̃1 in Equation 6 is
unchanged, as expected. In examining the situation where there is no noise, Equation 9
reduces to

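$$M_{1s} = S + S H_{01}$$
$$M_{2s} = S H_{00} + S H_{02}$$

This leads to the definition of H̃2:

$$\tilde{H}_2 = \frac{M_{2s}}{M_{1s}} = \frac{H_{00} + H_{02}}{1 + H_{01}} \qquad \text{(Eq. 10)}$$

Rewriting Equation 9 again using the definition for H̃1 (as in Equation 7)
provides

$$\tilde{H}_1 = \frac{M_1 - S (1 + H_{01})}{M_2 - S (H_{00} + H_{02})}$$

Some algebraic manipulation yields

$$S (1 + H_{01} - \tilde{H}_1 (H_{00} + H_{02})) = M_1 - M_2 \tilde{H}_1$$
$$S (1 + H_{01}) \left[ 1 - \tilde{H}_1 \frac{H_{00} + H_{02}}{1 + H_{01}} \right] = M_1 - M_2 \tilde{H}_1 \qquad \text{(Eq. 11)}$$
$$S (1 + H_{01}) [1 - \tilde{H}_1 \tilde{H}_2] = M_1 - M_2 \tilde{H}_1$$

and finally

$$S (1 + H_{01}) = \frac{M_1 - M_2 \tilde{H}_1}{1 - \tilde{H}_1 \tilde{H}_2} \qquad \text{(Eq. 12)}$$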


Equation 12 is the same as Equation 8, with the replacement of H0 by H̃2, and
the addition of the (1 + H01) factor on the left side. This extra factor means that S cannot
be solved for directly in this situation, but a solution can be generated for the signal
plus the addition of all of its echoes. This is not such a bad situation, as there are many
conventional methods for dealing with echo suppression, and even if the echoes are not
suppressed, it is unlikely that they will affect the comprehensibility of the speech to any
meaningful extent. The more complex calculation of H̃2 is needed to account for the
signal echoes in MIC 2, which act as noise sources.
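
A short Python sketch can illustrate what Equation 12 actually delivers when signal
echoes are present: the output is the signal scaled by (1 + H01), not the bare signal.
The path values and noise contributions below are hypothetical single-bin numbers
chosen only for illustration.

```python
def h2_tilde(h00, h01, h02):
    """Equation 10: ratio of the signal-only spectra, M2s / M1s."""
    return (h00 + h02) / (1 + h01)


def signal_plus_echoes(m1, m2, h1t, h2t):
    """Equation 12: returns S*(1 + H01), the signal together with its echoes."""
    return (m1 - m2 * h1t) / (1 - h1t * h2t)


# Hypothetical single-bin values.
s_true = 1.0 + 0.5j
h00 = 0.1 - 0.2j      # direct signal path to MIC 2
h01 = 0.3 + 0.1j      # reflected signal path to MIC 1
h02 = 0.05 + 0.05j    # reflected signal path to MIC 2
m1_noise = 2.0 + 0.5j     # summed noise contribution at MIC 1 (M1n)
m2_noise = 3.25 - 1.4j    # summed noise contribution at MIC 2 (M2n)

# Equation 9 with the noise sums collapsed into single terms.
m1 = s_true * (1 + h01) + m1_noise
m2 = s_true * (h00 + h02) + m2_noise

h1t = m1_noise / m2_noise            # Equation 6, measured when VAD = 0
h2t = h2_tilde(h00, h01, h02)
out = signal_plus_echoes(m1, m2, h1t, h2t)
print(abs(out - s_true * (1 + h01)))  # expected to be at rounding-error level
```

The noise terms cancel exactly, and what remains is the signal plus its MIC 1 echo,
which matches the observation above that S cannot be isolated directly in this case.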

Figure 5 is a flow diagram of a denoising method of an embodiment. In
operation, the acoustic signals are received 502. Further, physiological information
associated with human voicing activity is received 504. A first transfer function
representative of the acoustic signal is calculated upon determining that voicing
information is absent from the acoustic signal for at least one specified period of time
506. A second transfer function representative of the acoustic signal is calculated upon
determining that voicing information is present in the acoustic signal for at least one
specified period of time 508. Noise is removed from the acoustic signal using at least
one combination of the first transfer function and the second transfer function,
producing denoised acoustic data streams 510.

An algorithm for noise removal, or denoising algorithm, is described herein,
from the simplest case of a single noise source with a direct path to multiple noise
sources with reflections and echoes. The algorithm has been shown herein to be viable
under any environmental conditions. The type and amount of noise are inconsequential
if a good estimate has been made of H̃1 and H̃2, and if one does not change
substantially while the other is calculated. If the user environment is such that echoes
are present, they can be compensated for if coming from a noise source. If signal
echoes are also present, they will affect the cleaned signal, but the effect should be
negligible in most environments.

In operation, the algorithm of an embodiment has shown excellent results in
dealing with a variety of noise types, amplitudes, and orientations. However, there are
always approximations and adjustments that have to be made when moving from
mathematical concepts to engineering applications. One assumption is made in
Equation 3, where H2(z) is assumed small and therefore H2(z)H1(z) ≈ 0, so that
Equation 3 reduces to

$$S(z) \approx M_1(z) - M_2(z) H_1(z).$$

This means that only H1(z) has to be calculated, speeding up the process and reducing
the number of computations required considerably. With the proper selection of
microphones, this approximation is easily realized.

Another approximation involves the filter used in an embodiment. The actual
H1(z) will undoubtedly have both poles and zeros, but for stability and simplicity an
all-zero Finite Impulse Response (FIR) filter is used. With enough taps (around 60) the
approximation to the actual H1(z) is very good.
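
The simplified form, in which only H1(z) is used and it is modeled as an FIR filter,
can be sketched in the time domain as follows. This is a minimal Python illustration
that assumes H2(z) is exactly zero; the three-tap filter, the test signals, and the
function names are hypothetical (the text above suggests roughly 60 taps in practice).

```python
import math
import random


def fir_filter(x, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n-k], with zeros before n = 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * x[n - k]
        y.append(acc)
    return y


def denoise_h1_only(m1, m2, h1_taps):
    """Time-domain form of S ~= M1 - M2*H1: filter MIC 2 and subtract from MIC 1."""
    noise_at_mic1 = fir_filter(m2, h1_taps)
    return [a - b for a, b in zip(m1, noise_at_mic1)]


random.seed(0)
speech = [math.sin(0.2 * n) for n in range(200)]         # stand-in for speech
noise = [random.uniform(-1.0, 1.0) for _ in range(200)]
h1_taps = [0.5, -0.3, 0.1]                               # hypothetical 3-tap H1

m2 = noise                                               # MIC 2 hears only noise (H2 = 0)
m1 = [s + v for s, v in zip(speech, fir_filter(noise, h1_taps))]
cleaned = denoise_h1_only(m1, m2, h1_taps)
```

When some signal does leak into MIC 2, this form also subtracts a filtered copy of the
signal, which is the distortion the later sections address.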

Regarding subband selection, the wider the range of frequencies over which a
transfer function must be calculated, the more difficult it is to calculate it accurately.
Therefore the acoustic data was divided into 16 subbands, with the lowest frequency at
50 Hz and the highest at 3700 Hz. The denoising algorithm was then applied to each
subband in turn, and the 16 denoised data streams were recombined to yield the
denoised acoustic data. This works very well, but any combination of subbands (e.g.,
4, 6, 8, 32, equally spaced, perceptually spaced, etc.) can be used and has been found to
work as well.

The amplitude of the noise was constrained in an embodiment so that the
microphones used did not saturate (that is, operate outside a linear response region). It
is important that the microphones operate linearly to ensure the best performance. Even
with this restriction, very low signal-to-noise ratio (SNR) signals can be denoised
(down to -10 dB or less).

The calculation of H1(z) is accomplished every 10 milliseconds using the
Least-Mean Squares (LMS) method, a common adaptive method for calculating
transfer functions. An explanation may be found in "Adaptive Signal Processing"
(1985), by Widrow and Stearns, published by Prentice-Hall, ISBN 0-13-004029-0.
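
The LMS update can be sketched as below. This is a minimal Python illustration of
standard LMS system identification applied to noise-only data, with MIC 2 as the
reference input and MIC 1 as the desired output. The step size, the number of taps, the
synthetic noise, and the function name are assumptions chosen for illustration; the
source does not state the parameters used in the embodiment.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Adapt FIR weights w so that sum_k w[k] * x[n-k] tracks d[n]; returns w."""
    w = [0.0] * num_taps
    history = [0.0] * num_taps            # history[k] holds x[n-k]
    for x_n, d_n in zip(x, d):
        history = [x_n] + history[:-1]
        y_n = sum(w_k * h_k for w_k, h_k in zip(w, history))
        err = d_n - y_n
        # Standard LMS step: move each weight along err * input sample.
        w = [w_k + mu * err * h_k for w_k, h_k in zip(w, history)]
    return w


random.seed(1)
true_h1 = [0.5, -0.3, 0.1, 0.05]          # hypothetical noise path to MIC 1
m2_noise = [random.uniform(-1.0, 1.0) for _ in range(5000)]
m1_noise = [
    sum(true_h1[k] * m2_noise[n - k] for k in range(len(true_h1)) if n - k >= 0)
    for n in range(len(m2_noise))
]

h1_est = lms_identify(m2_noise, m1_noise, num_taps=4, mu=0.1)
print(h1_est)                             # should approach true_h1
```

In a running system the adaptation would only be enabled while the VAD reports no
speech, so that the weights describe the noise path and not the signal.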

The VAD for an embodiment is derived from a radio frequency sensor and the
two microphones, yielding very high accuracy (>99%) for both voiced and unvoiced
speech. The VAD of an embodiment uses a radio frequency (RF) interferometer to
detect tissue motion associated with human speech production, but is not so limited. It
is therefore completely acoustic-noise free, and is able to function in any acoustic noise
environment. A simple energy measurement of the RF signal can be used to determine
if voiced speech is occurring. Unvoiced speech can be determined using conventional
acoustic-based methods, by proximity to voiced sections determined using the RF
sensor or similar voicing sensors, or through a combination of the above. Since there is
much less energy in unvoiced speech, its activation accuracy is not as critical as voiced
speech.
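
A simple energy measurement of this kind might look like the following Python sketch,
which marks each frame of a sensor signal as voiced when its mean squared amplitude
exceeds a threshold. The frame length of 80 samples corresponds to 10 milliseconds at
8 kHz; the threshold value and the synthetic sensor signal are purely hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(v * v for v in frame) / len(frame)


def voiced_flags(sensor_signal, frame_len, threshold):
    """Return one flag per full frame: 1 if frame energy exceeds threshold, else 0."""
    flags = []
    for start in range(0, len(sensor_signal) - frame_len + 1, frame_len):
        frame = sensor_signal[start:start + frame_len]
        flags.append(1 if frame_energy(frame) > threshold else 0)
    return flags


# Hypothetical sensor trace: quiet, then strong tissue motion, then quiet again.
rf = [0.01] * 80 + [0.5, -0.5] * 40 + [0.01] * 80
flags = voiced_flags(rf, frame_len=80, threshold=0.01)
print(flags)
```

The resulting per-frame flags play the role of the VAD signal that selects when H1(z)
may be updated.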

With voiced and unvoiced speech detected reliably, the algorithm of an
embodiment can be implemented. Once again, it is useful to repeat that the noise
removal algorithm does not depend on how the VAD is obtained, only that it is
accurate, especially for voiced speech. If speech is not detected and training occurs on
the speech, the subsequent denoised acoustic data can be distorted.

Data was collected in four channels, one for MIC 1, one for MIC 2, and two for
the radio frequency sensor that detected the tissue motions associated with voiced
speech. The data were sampled simultaneously at 40 kHz, then digitally filtered and
decimated down to 8 kHz. The high sampling rate was used to reduce any aliasing that
might result from the analog to digital process. A four-channel National Instruments
A/D board was used along with Labview to capture and store the data. The data was
then read into a C program and denoised 10 milliseconds at a time.
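
The filter-then-decimate step from 40 kHz to 8 kHz can be sketched as follows. This
Python example uses a Hamming-windowed sinc low-pass filter and keeps every fifth
output sample. The filter length, the window, and the 3700 Hz cutoff are assumptions
made for illustration; the source does not describe the filter that was used.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc taps; cutoff is a fraction of the sampling rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        t = i - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]       # normalize to unity gain at DC


def decimate(x, factor, taps):
    """Low-pass filter x and keep only every `factor`-th output sample."""
    out = []
    for n in range(0, len(x), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * x[n - k]
        out.append(acc)
    return out


taps = lowpass_taps(61, 3700 / 40000)     # assumed cutoff below the new 4 kHz Nyquist
x_40k = [1.0] * 400                       # 10 ms of a constant test input at 40 kHz
x_8k = decimate(x_40k, 5, taps)
print(len(x_8k))                          # 80 samples, i.e. 10 ms at 8 kHz
```

Computing the filter only at the retained sample positions avoids work on the four
out of five outputs that would be discarded.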

Figure 6 shows results of a noise suppression algorithm of an embodiment for
an American English speaking female in the presence of airport terminal noise that
includes many other human speakers and public announcements. The speaker is
uttering the numbers 406-5562 in the midst of moderate airport terminal noise. The
dirty acoustic data was denoised 10 milliseconds at a time, and before denoising the 10
milliseconds of data were prefiltered from 50 to 3700 Hz. A reduction in the noise of
approximately 17 dB is evident. No post filtering was done on this sample; thus, all of
the noise reduction realized is due to the algorithm of an embodiment. It is clear that
the algorithm adjusts to the noise instantly, and is capable of removing the very difficult
noise of other human speakers. Many different types of noise have all been tested with
similar results, including street noise, helicopters, music, and sine waves, to name a
few. Also, the orientation of the noise can be varied substantially without significantly
changing the noise suppression performance. Finally, the distortion of the cleaned
speech is very low, ensuring good performance for speech recognition engines and
human receivers alike.

The noise removal algorithm of an embodiment has been shown to be viable
under any environmental conditions. The type and amount of noise are inconsequential
if a good estimate has been made of H̃1 and H̃2. If the user environment is such that
echoes are present, they can be compensated for if coming from a noise source. If
signal echoes are also present, they will affect the cleaned signal, but the effect should
be negligible in most environments.

Figure 7 is a block diagram of a physical configuration for denoising using a
unidirectional microphone M2 for the noise and an omnidirectional microphone M1 for
the speech, under the embodiments of Figures 2, 3, and 4. As described above, the path
from the speech to the noise microphone (MIC 2) is approximated as zero, and that
approximation is realized through the careful placement of omnidirectional and
unidirectional microphones. This works quite well (20-40 dB of noise suppression)
when the noise is oriented opposite the signal location (noise source N1). However,
when the noise source is oriented on the same side as the speaker (noise source N2), the
performance can drop to only 10-20 dB of noise suppression. This drop in suppression
ability can be attributed to the steps taken to ensure that H2 is close to zero. These steps
included the use of a unidirectional microphone for the noise microphone (MIC 2) so
that very little signal is present in the noise data. As the unidirectional microphone
cancels out acoustic information coming from a particular direction, it also cancels out
noise that is coming from the same direction as speech. This may limit the ability of
the adaptive algorithm to characterize and then remove noise in a location such as N2.
The same effect is noted when a unidirectional microphone is used for the speech
microphone, M1.

However, if the unidirectional microphone M2 is replaced with an
omnidirectional microphone, then a significant amount of signal is captured by M2.
This runs counter to the aforementioned assumption that H2 is zero, and as a result
during voicing a significant amount of signal is removed, resulting in denoising and
"de-signaling". This is not acceptable if signal distortion is to be kept to a minimum.
In order to reduce the distortion, therefore, a value is calculated for H2. However, the
value for H2 cannot be calculated in the presence of noise, or the noise will be
mislabeled as speech and not removed.

Experience with acoustic-only microphone arrays suggests that a small,
two-microphone array might be a solution to the problem. Figure 8 is a denoising
microphone configuration including two omnidirectional microphones, under an
embodiment. The same effect can be achieved through the use of two unidirectional
microphones, oriented in the same direction (toward the signal source). Yet another
embodiment uses one unidirectional microphone and one omnidirectional microphone.
The idea is to capture similar information from acoustic sources in the direction of the
signal source. The relative locations of the signal source and the two microphones are
fixed and known. By placing the microphones a distance d apart that corresponds with
n discrete time samples and placing the speaker on the axis of the array, H2 can be fixed
to be of the form Cz^-n, where C is the difference in amplitude of the signal data at M1
and M2. For the discussion that follows, the assumption is made that n = 1, although
any integer other than zero may be used. For causality, the use of positive integers is
recommended. As the amplitude of a spherical pressure source varies as 1/r, this allows
not only specification of the direction of the source but its distance. The C required can
be estimated by

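$$C \approx \frac{|S| \text{ at } M_2}{|S| \text{ at } M_1} = \frac{d_s}{d_s + d}$$

where d_s is the distance from the signal source to M1.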

Figure 9 is a plot of the C required versus distance, under the embodiment of
Figure 8. It can be seen that the asymptote is at C = 1.0, and C reaches 0.9 at
approximately 38 centimeters, slightly more than a foot, and 0.94 at approximately 60
cm. At the distances normally encountered in a handset and earpiece (4 to 12 cm), C
would be between approximately 0.5 and 0.75. This is a difference of approximately 19
to 44% with the noise source located at approximately 60 cm, and it is clear that most
noise sources would be located farther away than that. Therefore, the system using this
configuration would be able to discriminate between noise and signal quite effectively,
even when they have a similar orientation.
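
The quoted values of C can be reproduced with a short calculation. The Python sketch
below assumes the 1/r amplitude law stated above, so that C is the source-to-M1
distance divided by the source-to-M2 distance, and assumes a microphone spacing of
4 cm. The spacing is an assumption inferred from the quoted numbers, not a value
stated at this point in the text, and the function name is hypothetical.

```python
def required_c(source_distance_cm, mic_spacing_cm=4.0):
    """Amplitude ratio |S| at M2 over |S| at M1 for a 1/r source on the array axis."""
    return source_distance_cm / (source_distance_cm + mic_spacing_cm)


for distance in (4, 12, 38, 60, 100):
    print(distance, round(required_c(distance), 2))
```

Under these assumptions a source at 4 to 12 cm gives C between 0.5 and 0.75, while a
source at 60 cm gives about 0.94, so a near talker and a more distant noise source on
the same axis are separated by the 19 to 44% difference mentioned above.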

To determine the effects on denoising of poor estimates of C, assume that
C = nC0, where C is an estimate and C0 is the actual value of C. Using the signal
definition from above,

$$S(z) = \frac{M_1(z) - M_2(z) H_1(z)}{1 - H_1(z) H_2(z)},$$

it has been assumed that H2(z) was very small, so that the signal could be approximated
by

$$S(z) \approx M_1(z) - M_2(z) H_1(z).$$

This is true if there is no speech, because by definition H2 = 0. However, if speech is
occurring, H2 is nonzero, and if set to be Cz^-1,

$$S(z) = \frac{M_1(z) - M_2(z) H_1(z)}{1 - C z^{-1} H_1(z)},$$

which can be rewritten as

$$S(z) = \frac{M_1(z) - M_2(z) H_1(z)}{1 - n C_0 z^{-1} H_1(z)} = \frac{M_1(z) - M_2(z) H_1(z)}{1 - C_0 z^{-1} H_1(z) + (1 - n) C_0 z^{-1} H_1(z)}.$$

The last term in the denominator determines the error due to the poor estimation of C.
This term is labeled E:

$$E = (1 - n) C_0 z^{-1} H_1(z).$$

Because z^-1 H1(z) is a filter, its magnitude will always be positive. Therefore the change
in calculated signal magnitude due to E will depend completely on (1 - n).

There are two possibilities for errors: underestimation of C (n < 1), and
overestimation of C (n > 1). In the first case, C is estimated to be smaller than it
actually is, or the signal is closer than estimated. In this case (1 - n) and therefore E is
positive. The denominator is therefore too large, and the magnitude of the cleaned
signal is too small. This would indicate de-signaling. In the second case, the signal is
farther away than estimated, and E is negative, making S larger than it should be. In
this case the denoising is insufficient. Because very low signal distortion is desired, the
estimations should err toward overestimation of C.

This result also shows that noise located in the same solid angle (direction from
M1) as the signal will be substantially removed depending on the change in C between
the signal location and the noise location. Thus, when using a handset with M1
approximately 4 cm from the mouth, the required C is approximately 0.5, and for noise
at approximately 1 meter the C is approximately 0.96. Thus, for the noise, the estimate
of C = 0.5 means that for the noise C is underestimated, and the noise will be removed.
The amount of removal will depend directly on (1 - n). Therefore, this algorithm uses
the direction and the range to the signal to separate the signal from the noise.

One issue that arises involves stability of this technique. Specifically, the
deconvolution of (1 - H1H2) raises the question of stability, as the need arises to calculate
the inverse of 1 - H1H2 at the beginning of each voiced segment. This helps reduce the
computing time, or number of instructions per cycle, needed to implement the
algorithm, as there is no requirement to calculate the inverse for every voiced window,
just the first one, as H2 is considered to be constant. This approximation will make
false positives more computationally expensive, however, by requiring a calculation of
the inverse of 1 - H1H2 every time a false positive is encountered.

Fortunately, the choice of H2 eliminates the need for a deconvolution. From the
discussion above, the signal can be written as

$$S(z) = \frac{M_1(z) - M_2(z) H_1(z)}{1 - H_1(z) H_2(z)},$$

which can be rewritten as

$$S(z) = M_1(z) - M_2(z) H_1(z) + S(z) H_2(z) H_1(z)$$

or

$$S(z) = M_1(z) - H_1(z) [M_2(z) - S(z) H_2(z)].$$

However, since H2(z) is of the form Cz^-1, the sequence in the time domain would look
like

$$s[n] = m_1[n] - h_1 * [m_2[n] - C \cdot s[n-1]],$$

meaning that the present signal sample requires the present MIC 1 signal, the present
MIC 2 signal, and the previous signal sample. This means that no deconvolution is
needed, just a simple subtraction and then a convolution as before. The increase in
computations required is minimal. Therefore this improvement is easy to implement.
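
The time-domain recursion can be sketched in Python as follows, under the assumption
that H2(z) = Cz^-1 and that H1(z) is an FIR filter. At each sample the previous cleaned
sample, scaled by C, is subtracted from MIC 2, the result is convolved with h1, and the
output is subtracted from MIC 1. The filter taps, the value of C, the synthetic signals,
and the function name are hypothetical.

```python
import math
import random


def denoise_with_delay_h2(m1, m2, h1_taps, c):
    """s[n] = m1[n] - sum_k h1[k] * r[n-k], where r[n] = m2[n] - c * s[n-1]."""
    s = []
    r = []                                 # MIC 2 with the leaked signal removed
    for n in range(len(m1)):
        s_prev = s[n - 1] if n > 0 else 0.0
        r.append(m2[n] - c * s_prev)
        noise_at_mic1 = sum(h * r[n - k] for k, h in enumerate(h1_taps) if n - k >= 0)
        s.append(m1[n] - noise_at_mic1)
    return s


random.seed(2)
speech = [math.sin(0.3 * n) for n in range(300)]          # stand-in for speech
noise = [random.uniform(-1.0, 1.0) for _ in range(300)]
h1_taps = [0.4, 0.2, 0.1]                                 # hypothetical H1
c = 0.5                                                   # hypothetical C, so H2 = 0.5 z^-1

# Equation 1 in the time domain: MIC 1 gets filtered noise, MIC 2 gets delayed signal.
m1 = [
    speech[n] + sum(h * noise[n - k] for k, h in enumerate(h1_taps) if n - k >= 0)
    for n in range(len(speech))
]
m2 = [noise[n] + (c * speech[n - 1] if n > 0 else 0.0) for n in range(len(speech))]

cleaned = denoise_with_delay_h2(m1, m2, h1_taps, c)
```

Because the feedback uses only the previous output sample, each new sample costs one
extra multiply and subtract on top of the FIR convolution, in line with the remark that
the added computation is minimal.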

The effects of the difference in microphone response on this embodiment can be
shown by examining the configurations described with reference to Figures 2, 3, and 4,
only this time transfer functions A(z) and B(z) are included, which represent the
frequency response of MIC 1 and MIC 2 along with their filtering and amplification
responses. Figure 10 is a block diagram of a front end of a noise removal algorithm
under an embodiment in which the two microphones MIC 1 and MIC 2 have different
response characteristics.

Figure 10 includes a graphic description of the process of an embodiment, with 
a single signal source 1000 and a single noise source 1001. This algorithm uses two 
microphones: a "signal" microphone 1 ("MIC 1") and a "noise" microphone 2 ("MIC 
2"), but is not so limited. MIC 1 is assumed to capture mostly signal with some noise, 
while MIC 2 captures mostly noise with some signal. The data from the signal source 
1000 to MIC 1 is denoted by s(n), where s(n) is a discrete sample of the analog signal 
from the source 1000. The data from the signal source 1000 to MIC 2 is denoted by 
$s_2(n)$. The data from the noise source 1001 to MIC 2 is denoted by n(n). The data from 
the noise source 1001 to MIC 1 is denoted by $n_2(n)$. 

A transfer function A(z) represents the frequency response of MIC 1 along 
with its filtering and amplification responses. A transfer function B(z) represents the 
frequency response of MIC 2 along with its filtering and amplification responses. The 
output of the transfer function A(z) is denoted by $m_1(n)$, and the output of the transfer 
function B(z) is denoted by $m_2(n)$. The signals $m_1(n)$ and $m_2(n)$ are received by a noise 
removal element 1005, which operates on the signals and outputs "cleaned speech". 

Hereafter, the term "frequency response of MIC X" will include the combined 
effects of the microphone and any amplification or filtering processes that occur during 
the data recording process for that microphone. When solving for the signal and noise 
(suppressing "z" for clarity), 

$$S = \frac{M_1}{A} - H_1 N$$

$$N = \frac{M_2}{B} - H_2 S$$

wherein substituting the latter into the former produces 

$$S = \frac{\dfrac{M_1}{A} - \dfrac{H_1 M_2}{B}}{1 - H_1 H_2},$$

which seems to indicate that the differences in frequency response (between MIC 1 and 
MIC 2) have an impact. However, what is being measured has to be noted. Formerly 
(before taking the frequency response of the microphones into account), $H_1$ was 
measured using 

$$H_1 = \frac{M_{1n}}{M_{2n}},$$

where the n subscripts indicate that this calculation only occurs during windows that 
contain only noise. However, when examining the equations, it is noted that when 
there is no signal the following is measured at the microphones: 

$$M_{1n} = H_1 N A$$

$$M_{2n} = N B,$$

therefore $H_1$ should be calculated as 

$$H_1 = \frac{B\,M_{1n}}{A\,M_{2n}}.$$

However, B(z) and A(z) are not taken into account when calculating $H_1(z)$. 
Therefore what is actually measured is just the ratio of the signals in each microphone: 

$$\tilde{H}_1 = \frac{M_{1n}}{M_{2n}} = H_1 \frac{A}{B},$$

where $\tilde{H}_1$ represents the measured response and $H_1$ the actual response. The 
calculation for $H_2$ is analogous, and results in 

$$\tilde{H}_2 = H_2 \frac{B}{A}.$$

Substituting $\tilde{H}_1$ and $\tilde{H}_2$ back into the equation for S above produces 

$$S = \frac{\dfrac{M_1}{A} - \dfrac{B\,\tilde{H}_1 M_2}{A\,B}}{1 - \tilde{H}_1 \tilde{H}_2},$$

or 

$$S A = \frac{M_1 - \tilde{H}_1 M_2}{1 - \tilde{H}_1 \tilde{H}_2},$$

which is the same as before, when the frequency response of the microphones was not 
included. Here S(z)A(z) takes the place of S(z), and the measured values ($\tilde{H}_1(z)$ and 
$\tilde{H}_2(z)$) take the place of the actual $H_1(z)$ and $H_2(z)$. Thus, this algorithm is, in 
theory, independent of the microphone and associated filter and amplifier response. 
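
The cancellation of A(z) and B(z) can be checked numerically at a single frequency. The sketch below is a sanity check under stated assumptions rather than part of the described method: it assumes the microphone outputs are $M_1 = A(S + N H_1)$ and $M_2 = B(N + S H_2)$, that the measured $H_1$ is the microphone ratio taken with no signal present, and that the measured $H_2$ is the corresponding ratio taken with no noise present. The function name and the complex test values are hypothetical.

```python
def check_response_independence(S, N, H1, H2, A, B):
    """Return (estimate, S*A) for one frequency bin under the assumed mic model."""
    # Microphone outputs, including each microphone's own response A or B.
    M1 = A * (S + N * H1)
    M2 = B * (N + S * H2)

    # Measured transfer functions: plain ratios of microphone outputs.
    H1_meas = (A * N * H1) / (B * N)  # noise-only window (S = 0): M1n / M2n
    H2_meas = (B * S * H2) / (A * S)  # speech-only window (N = 0): M2s / M1s

    # Denoising formula applied with the measured quantities.
    estimate = (M1 - H1_meas * M2) / (1 - H1_meas * H2_meas)
    return estimate, S * A
```

For any nonzero choice of inputs with $H_1 H_2 \neq 1$, the two returned values should agree to rounding error: the output is the signal as seen through MIC 1's response, with no remaining dependence on B.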

However, in practice, it is assumed that $H_2 = Cz^{-1}$ (where C is a constant), but it 
is actually 

$$\tilde{H}_2 = \frac{B}{A}\,C z^{-1},$$

so the result is 

$$S A = \frac{M_1 - \tilde{H}_1 M_2}{1 - \dfrac{B}{A}\,\tilde{H}_1 C z^{-1}},$$

which is dependent on B(z) and A(z), which are not known. This can cause problems if 
the frequency response of the microphones is substantially different, which is a 
common occurrence, especially for the inexpensive microphones frequently used. This 
means that the data from MIC 2 should be compensated so that it has the proper 
relationship to the data coming from MIC 1. This can be done by recording a broadband 
signal in both MIC 1 and MIC 2 from a source that is located at the distance and 
orientation expected for the actual signal (the actual signal source could also be used). 
A discrete Fourier transform (DFT) for each microphone signal is then calculated, and 
the magnitude of the transform at each frequency bin is calculated. The magnitude of 
the DFT for MIC 2 in each frequency bin is then set to be equal to C multiplied by the 
magnitude of the DFT for MIC 1. If $M_1[n]$ represents the nth frequency bin magnitude 
of the DFT for MIC 1, then the factor that is multiplied by $M_2[n]$ would be 

$$F[n] = \frac{C\,M_1[n]}{M_2[n]}.$$

The inverse transform is then applied to the new MIC 2 DFT amplitude, using the 
previous MIC 2 DFT phase. In this manner, MIC 2 is resynthesized so that the 
relationship 

$$M_2(z) = M_1(z) \cdot C z^{-1}$$

is correct for the times when only speech is occurring. This transformation could also 
be performed in the time domain, using a filter that would emulate the properties of F 
as closely as possible (for example, the Matlab function FFT2.M could be used with the 
calculated values of F[n] to construct a suitable FIR filter). 
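
The frequency-domain compensation described above can be sketched as follows. This is a minimal illustration under assumptions, not the implementation of the embodiment: it uses a naive DFT from the Python standard library so that it is self-contained, it processes one block of samples, and it sets the factor to zero in any bin where the MIC 2 magnitude is negligible. The names `dft`, `idft`, `compensate_mic2` and the `eps` threshold are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def compensate_mic2(m1, m2, c, eps=1e-12):
    """Resynthesize MIC 2 so that |DFT(m2)| = c * |DFT(m1)| in every bin.

    Returns (factors, m2_new), where factors[k] plays the role of F[k].
    """
    M1, M2 = dft(m1), dft(m2)
    factors, new_spectrum = [], []
    for a, b in zip(M1, M2):
        mag2 = abs(b)
        # F = C * |M1| / |M2|; guard against an empty MIC 2 bin (assumption).
        f = c * abs(a) / mag2 if mag2 > eps else 0.0
        factors.append(f)
        # Scaling the complex bin by a real factor changes only its magnitude,
        # so the original MIC 2 phase is kept.
        new_spectrum.append(f * b)
    # For real inputs the new spectrum is conjugate-symmetric, so the inverse
    # transform is real up to rounding; the tiny imaginary part is dropped.
    m2_new = [v.real for v in idft(new_spectrum)]
    return factors, m2_new
```

In a practical system the factors would presumably be computed once from the broadband calibration recording and then applied to later MIC 2 data, and a fast transform would replace the naive one used here.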

Figure 11A is a plot of the difference in frequency response (percent) between 
the microphones (at a distance of 4 centimeters) before compensation. Figure 11B is a 
plot of the difference in frequency response (percent) between the microphones (at a 
distance of 4 centimeters) after DFT compensation. Figure 11C is a plot of the 
difference in frequency response (percent) between the microphones (at a distance of 4 
centimeters) after time-domain filter compensation. These plots show the effectiveness 
of the compensation methods described above. Thus, using two very inexpensive 
omnidirectional or unidirectional microphones, both compensation methods restore the 
correct relationship between the microphones. 

The transformation should be relatively constant as long as the relative 
amplifications and filtering processes are unchanged. Thus, it is possible that the 
compensation process would only need to be performed once at the manufacturing 
stage. However, if need be, the algorithm could be set to operate assuming $H_2 = 0$ until 
the system was used in a place with very little noise and strong signal. Then the 
compensation coefficients F[n] could be calculated and used from that time on. Since 
denoising is not required when there is very little noise, this calculation would not 
impose undue strain on the denoising algorithm. The denoising coefficients could also 
be updated any time the noise environment is favorable for maximum accuracy. 

Each of the blocks and steps depicted in the figures presented herein can each 
include a sequence of operations that need not be described herein. Those skilled in the 
relevant art can create routines, algorithms, source code, microcode, program logic 
arrays or otherwise implement the invention based on the figures and the detailed 
description provided herein. The routines described herein can include any of the 
following, or one or more combinations of the following: a routine stored in 
non-volatile memory (not shown) that forms part of an associated processor or processors; a 
routine implemented using conventional programmed logic arrays or circuit elements; a 
routine stored in removable media such as disks; a routine downloaded from a server 
and stored locally at a client; and a routine hardwired or preprogrammed in chips such 
as electrically erasable programmable read only memory ("EEPROM") semiconductor 
chips, application specific integrated circuits (ASICs), or by digital signal processing 
(DSP) integrated circuits. 

Unless the context clearly requires otherwise, throughout the description and the 
claims, the words "comprise," "comprising," and the like are to be construed in an 
inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense 
of "including, but not limited to." Words using the singular or plural number also 
include the plural or singular number respectively. Additionally, the words "herein," 
"hereunder," and words of similar import, when used in this application, shall refer to 
this application as a whole and not to any particular portions of this application. 

The above description of illustrated embodiments of the invention is not 
intended to be exhaustive or to limit the invention to the precise form disclosed. While 
specific embodiments of, and examples for, the invention are described herein for 
illustrative purposes, various equivalent modifications are possible within the scope of 
the invention, as those skilled in the relevant art will recognize. The teachings of the 
invention provided herein can be applied to other machine vision systems, not only for 
the data collection symbology reader described above. Further, the elements and acts 
of the various embodiments described above can be combined to provide further 
embodiments. 

Any references or U.S. patent applications referenced herein are incorporated 
herein by reference. Aspects of the invention can be modified, if necessary, to employ 
the systems, functions and concepts of these various references to provide yet further 
embodiments of the invention. 

CLAIMS 

What is claimed is: 



1. A method for removing noise from electronic signals, comprising: 
    receiving a plurality of acoustic signals in a first receiving device; 
    receiving a plurality of acoustic signals in a second receiving device, wherein the plurality of acoustic signals include at least one noise signal generated by at least one noise source and at least one voice signal generated by at least one signal source, wherein the at least one signal source comprises a human speaker, and wherein relative locations of the signal source, the first receiving device, and the second receiving device are fixed and known; 
    receiving physiological information associated with human voicing activity of the human speaker, including whether voice activity is present; 
    generating at least one first transfer function representative of the plurality of acoustic noise signals upon determining that voicing activity is absent from the plurality of acoustic signals for at least one specified period; 
    generating at least one second transfer function representative of the plurality of acoustic signals upon determining that voicing information is present in the plurality of acoustic signals for the at least one specified period of time; and 
    removing noise from the plurality of acoustic signals using at least one combination of the at least one first transfer function and the at least one second transfer function to produce at least one denoised data stream. 

2. The method of claim 1, wherein the first receiving device and the second receiving device each comprise a microphone selected from a group comprising unidirectional microphones and omnidirectional microphones. 

3. The method of claim 1, wherein the plurality of acoustic signals are received in discrete time samples, and wherein the first receiving device and the second receiving device are located a distance "d" apart, wherein d corresponds to n discrete time samples. 

4. The method of claim 1, wherein the at least one second transfer function is fixed as a function of a difference in amplitude of signal data at the first receiving device and the amplitude of signal data at the second receiving device. 

5. The method of claim 1, wherein removing noise from the plurality of acoustic signals includes using a direction and a range to the at least one signal source from the at least one first receiving device. 

6. The method of claim 1, wherein respective frequency responses of the at least one first receiving device and the second at least one receiving device are different, and wherein the signal data from the at least one second receiving device is compensated to have a proper relationship to signal data from the at least one first receiving device. 

7. The method of claim 6, wherein compensating the signal data from the at least one second receiving device comprises recording a broadband signal in the at least one first receiving device and the at least one second receiving device from a source located at a distance and an orientation expected for a signal from the at least one signal source. 

8. The method of claim 6, wherein compensating the signal data from the at least one second receiving device comprises frequency domain compensation. 

9. The method of claim 8, wherein frequency compensation comprises: 
    calculating a frequency transform for signal data from each of the at least one first receiving device and the at least one second receiving device signal is calculated; 
    calculating a magnitude of the frequency transform at each frequency bin; and 
    setting a magnitude of the frequency transform for the signal data from the at least one second receiving device in each frequency to a value related to a magnitude of the frequency transform for the signal data from the at least one first receiving device. 

10. The method of claim 6, wherein compensating the signal data from the at least one second receiving device comprises time domain compensation. 

11. The method of claim 6, further comprising: 
    initially setting the at least one second transfer function to zero; and 
    calculating compensation coefficients at times when there the at least one noise signal is small relative to the at least one voice signal. 

12. The method of claim 1, wherein the plurality of acoustic signals include at least one reflection of the at least one noise signal and at least one reflection of the at least one voice signal. 

13. The method of claim 1, wherein receiving physiological information comprises receiving physiological data associated with human voicing using at least one detector selected from a group consisting of acoustic microphones, radio frequency devices, electroglottographs, ultrasound devices, acoustic throat microphones, and airflow detectors. 

14. The method of claim 1 wherein generating the at least one first transfer function and the at least one second transfer function comprises use of at least one technique selected from a group comprising adaptive techniques and recursive techniques. 

15. A system for removing noise from acoustic signals, comprising: 
    at least one receiver comprising, 
        at least one signal receiver configured to receive at least one acoustic signal from a signal source; and 
        at least one noise receiver configured to receive at least one noise signal from a noise source, wherein relative locations of the signal source, the at least one signal receiver, and the at least one noise receiver are fixed and known; 
    at least one sensor that receives physiological information associated with human voicing activity; and 
    at least one processor coupled among the at least one receiver and the at least one sensor that generates a plurality of transfer functions, wherein at least one first transfer function representative of the at least one acoustic signal is generated in response to a determination that voicing information is absent from the at least one acoustic signal for at least one specified period of time, wherein at least one second transfer function representative of the at least one acoustic signal is generated in response to a determination that voicing information is present in the at least one acoustic signal for at least one specified period of time, wherein noise is removed from the at least one acoustic signal using at least one combination of the at least one first transfer function and the at least one second transfer function. 

16. The system of claim 15, wherein the at least one sensor includes at least one radio frequency ("RF") interferometer that detects tissue motion associated with human speech. 

17. The system of claim 15, wherein the at least one sensor includes at least one sensor selected from a group consisting of acoustic microphones, radio frequency devices, electroglottographs, ultrasound devices, acoustic throat microphones, and airflow detectors. 

18. The system of claim 15, wherein the at least one processor is configured to: 
    divide acoustic data of the at least one acoustic signal into a plurality of subbands; 
    remove noise from each of the plurality of subbands using the at least one combination of the at least one first transfer function and the at least one second transfer function, wherein a plurality of denoised acoustic data streams are generated; and 
    combine the plurality of denoised acoustic data streams to generate the at least one denoised acoustic data stream. 

19. The system of claim 15, wherein the at least one signal receiver and the at least one noise receiver are each microphones selected from a group comprising unidirectional microphones and omnidirectional microphones. 

20. A signal processing system coupled among at least one user and at least one electronic device, the signal processing system comprising: 
    at least one first receiving device configured to receive at least one acoustic signal from a signal source; 
    at least one second receiving device configured to receive at least one noise signal from a noise source, wherein relative locations of the signal source, the at least one first receiving device, and the at least one second receiving device are fixed and known; and 
    at least one denoising subsystem for removing noise from acoustic signals, the denoising subsystem comprising: 
        at least one processor coupled among the at least one first receiver and the at least one second receiver; and 
        at least one sensor coupled to the at least one processor, wherein the at least one sensor is configured to receive physiological information associated with human voicing activity, wherein the at least one processor generates a plurality of transfer functions, wherein at least one first transfer function representative of the at least one acoustic signal is generated in response to a determination that voicing information is absent from the at least one acoustic signal for at least one specified period of time, wherein at least one second transfer function representative of the at least one acoustic signal is generated in response to a determination that voicing information is present in the at least one acoustic signal for at least one specified period of time, wherein noise is removed from the at least one acoustic signal using at least one combination of the at least one first transfer function and the at least one second transfer function to produce at least one denoised data stream. 

21. The signal processing system of claim 20, wherein the first receiving device and the second receiving device are each microphones selected from a group comprising unidirectional microphones and omnidirectional microphones. 

22. The signal processing system of claim 20, wherein the at least one acoustic signal is received in discrete time samples, and wherein the first receiving device and the second receiving device are located a distance "d" apart, wherein d corresponds to n discrete time samples. 

23. The signal processing system of claim 20, wherein the at least one second transfer function is fixed as a function of a difference in amplitude of signal data at the first receiving device and the amplitude of signal data at the second receiving device. 

24. The signal processing system of claim 20, wherein removing noise from the at least one acoustic signal includes using a direction and a range to the at least one signal source from the at least one first receiving device. 

25. The signal processing system of claim 20, wherein respective frequency responses of the at least one first receiving device and the second at least one receiving device are different, and wherein the signal data from the at least one second receiving device is compensated to have a proper relationship to signal data from the at least one first receiving device. 

26. The signal processing system of claim 25, wherein compensating the signal data from the at least one second receiving device comprises recording a broadband signal in the at least one first receiving device and the at least one second receiving device from a source located at a distance and an orientation expected for a signal from the at least one signal source. 

27. The signal processing system of claim 25, wherein compensating the signal data from the at least one second receiving device comprises frequency domain compensation. 

28. The signal processing system of claim 27, wherein frequency compensation comprises: 
    calculating a frequency transform for signal data from each of the at least one first receiving device and the at least one second receiving device signal is calculated; 
    calculating a magnitude of the frequency transform at each frequency bin; and 
    setting a magnitude of the frequency transform for the signal data from the at least one second receiving device in each frequency to a value related to a magnitude of the frequency transform for the signal data from the at least one first receiving device. 

29. The signal processing system of claim 25, wherein compensating the signal data from the at least one second receiving device comprises time domain compensation. 

30. The signal processing system of claim 25, wherein compensating further comprises: 
    initially setting the at least one second transfer function to zero; and 
    calculating compensation coefficients at times when there the at least one noise signal is small relative to the at least one acoustic signal. 

31. The signal processing system of claim 20, wherein the at least one acoustic signal includes at least one reflection of the at least one noise signal and at least one reflection of the at least one acoustic signal. 

32. The signal processing system of claim 20, wherein receiving physiological information comprises receiving physiological data associated with human voicing using at least one detector selected from a group consisting of acoustic microphones, radio frequency devices, electroglottographs, ultrasound devices, acoustic throat microphones, and airflow detectors. 

33. The signal processing system of claim 20 wherein generating the at least one first transfer function and the at least one second transfer function comprises use of at least one technique selected from a group comprising adaptive techniques and recursive techniques. 